Generalized-Log Spectral Mean Normalization for Speech Recognition

Authors

  • Hilman Ferdinandus Pardede
  • Koichi Shinoda
Abstract

Most compensation methods for noise-robust speech recognition assume independence between speech, additive noise, and convolutive noise. However, the nonlinear nature of the distortion caused by noise may introduce correlation between noise and speech. To tackle this issue, we propose generalized-log spectral mean normalization (GLSMN), in which log spectral mean normalization (LSMN) is carried out in the q-logarithmic domain. Experiments on the Aurora-2 database show that GLSMN improved speech recognition accuracy by 20% compared to cepstral mean normalization (CMN) in the mel-frequency domain.
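The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea, assuming the standard Tsallis q-logarithm, ln_q(x) = (x^(1-q) - 1) / (1 - q), which reduces to the natural logarithm as q approaches 1. The function names, the frames[t][k] layout, and the choice of a per-utterance mean over time are illustrative assumptions, not the authors' implementation.

```python
import math

def q_log(x, q):
    """Tsallis q-logarithm; reduces to the natural log as q -> 1."""
    if x <= 0:
        raise ValueError("q_log is defined here for positive inputs only")
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def q_log_mean_normalize(frames, q):
    """Subtract the per-bin temporal mean in the q-log domain.

    frames: list of frames, each a list of positive spectral values
    (assumed layout: frames[t][k] is bin k at time t).
    """
    # Map every spectral value into the q-logarithmic domain.
    transformed = [[q_log(v, q) for v in frame] for frame in frames]
    n_frames = len(transformed)
    n_bins = len(transformed[0])
    # Mean of each bin over all frames of the utterance.
    means = [sum(f[k] for f in transformed) / n_frames for k in range(n_bins)]
    # Mean normalization: remove the per-bin mean from every frame.
    return [[f[k] - means[k] for k in range(n_bins)] for f in transformed]

# Hypothetical toy input: 3 frames, 2 spectral bins.
frames = [[1.0, 2.0], [4.0, 8.0], [2.0, 4.0]]
normalized = q_log_mean_normalize(frames, q=0.5)
```

With q = 1 this sketch is ordinary log spectral mean normalization, where a constant per-bin gain cancels exactly; for other values of q the transform is a power-law compression, so the cancellation is no longer exact and q becomes a tuning parameter.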


Similar resources

Improving the performance of MFCC for Persian robust speech recognition

Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step that applies to t...


Feature normalization based on non-extensive statistics for speech recognition

Most compensation methods to improve the robustness of speech recognition systems in noisy environments, such as spectral subtraction, CMN, and MVN, rely on the assumption that noise and speech spectra are independent. However, the use of a limited window in signal processing may introduce a cross-term between them, which deteriorates the speech recognition accuracy. To tackle this problem, we introduce...


Speech Recognition by Denoising and Dereverberation Based on Spectral Subtraction in a Real Noisy Reverberant Environment

A blind dereverberation method based on spectral subtraction using a multi-channel least mean squares algorithm was previously proposed. The results of a large vocabulary continuous speech recognition task showed that this method achieved significant improvements over the conventional method based on cepstral mean normalization and beamforming in a simulated reverberant environment without addi...


Improved feature enhancement using temporal filtering in speech recognition

The difference between training and testing environments is the major cause of performance degradation in speech recognition. In this paper, to further decrease the mismatch, we apply temporal filtering, Auto-Regression and Moving-Average (ARMA) filtering or RelAtive SpecTrAl (RASTA) filtering, as a post-processor for the log-Energy dynamic Range Normalization-Cepstral Mean and Variance Normal...


A Log-energy Scaling Normalization Scheme for Robust Speech Recognition

The log-energy parameter, as an auxiliary but influential feature, has been commonly used to augment Mel-frequency cepstral coefficients (MFCCs) to improve the recognition accuracy in automatic speech recognition (ASR). In this paper, a new and effective scaling approach named log-energy scaling normalization (LESN), which utilizes special nonlinear scaling functions on noisy speech data for lo...




Publication date: 2011